selective module
InsectMamba: Insect Pest Classification with State Space Model
Wang, Qianning, Wang, Chenglin, Lai, Zhixin, Zhou, Yucheng
The classification of insect pests is a critical task in agricultural technology, vital for ensuring food security and environmental sustainability. However, the complexity of pest identification, due to factors like high camouflage and species diversity, poses significant obstacles. Existing methods struggle with the fine-grained feature extraction needed to distinguish between closely related pest species. Although recent work has used modified network structures and combined deep learning approaches to improve accuracy, challenges persist because pests often closely resemble their surroundings. To address this problem, we introduce InsectMamba, a novel approach that integrates State Space Models (SSMs), Convolutional Neural Networks (CNNs), Multi-Head Self-Attention (MSA), and Multilayer Perceptrons (MLPs) within Mix-SSM blocks. This integration facilitates the extraction of comprehensive visual features by leveraging the strengths of each encoding strategy. A selective module is also proposed to adaptively aggregate these features, enhancing the model's ability to discern pest characteristics. InsectMamba was evaluated against strong competitors across five insect pest classification datasets. The results demonstrate its superior performance, and an ablation study verifies the contribution of each model component.
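The adaptive aggregation described in the abstract above can be illustrated with a minimal sketch. This is not the InsectMamba implementation: it assumes each encoding branch (SSM, CNN, MSA, MLP) has already produced a feature vector of the same length, and it scores each branch from its mean activation using one hypothetical scalar per branch (`score_weights`) in place of learned parameters. The function names are illustrative only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def selective_aggregate(branch_features, score_weights):
    """Fuse per-branch feature vectors with softmax selection weights.

    branch_features: list of equal-length feature vectors, one per branch.
    score_weights:   hypothetical scalar per branch, standing in for the
                     learned scoring parameters of a real model.
    """
    # Summarise each branch by its mean activation (a global descriptor).
    descriptors = [sum(f) / len(f) for f in branch_features]
    # Turn descriptors into one selection weight per branch.
    scores = [w * d for w, d in zip(score_weights, descriptors)]
    alphas = softmax(scores)
    # Weighted sum of the branches, element by element.
    dim = len(branch_features[0])
    fused = [
        sum(a * f[i] for a, f in zip(alphas, branch_features))
        for i in range(dim)
    ]
    return fused, alphas
```

With all scoring weights set to zero, every branch receives the same selection weight and the result reduces to a plain average; non-zero weights shift the mixture toward the branches with higher scores.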
RGB-D-based Stair Detection using Deep Learning for Autonomous Stair Climbing
Wang, Chen, Pei, Zhongcai, Qiu, Shuang, Tang, Zhiyong
Stairs are common building structures in urban environments, and stair detection is an important part of environment perception for autonomous mobile robots. Most existing algorithms have difficulty combining the visual information from binocular sensors effectively and ensuring reliable detection at night or when visual cues are extremely fuzzy. To solve these problems, we propose a neural network architecture with RGB and depth map inputs. Specifically, we design a selective module that lets the network learn the complementary relationship between the RGB map and the depth map and combine the information from both effectively in different scenes. In addition, we design a line clustering algorithm for postprocessing, which makes full use of the detection results to obtain the geometric stair parameters. Experiments on our dataset show that our method achieves better accuracy and recall than existing state-of-the-art deep learning methods, with improvements of 5.64% and 7.97%, respectively, and that it also has extremely fast detection speed. A lightweight version can achieve 300+ frames per second at the same resolution, which meets the needs of most real-time detection scenes.
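One common way to let a network weigh two modalities against each other is a sigmoid gate. The sketch below illustrates that general idea only; it is not the selective module from the paper above, whose internal design the abstract does not specify. The parameters `w_rgb`, `w_depth`, and `bias` are hypothetical scalars standing in for learned weights, and the inputs are assumed to be RGB and depth feature vectors of equal length.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(rgb_feat, depth_feat, w_rgb, w_depth, bias=0.0):
    """Blend RGB and depth features element-wise with a sigmoid gate.

    A gate near 1 trusts the RGB feature; a gate near 0 trusts depth.
    w_rgb, w_depth and bias are hypothetical stand-ins for learned weights.
    """
    fused, gates = [], []
    for r, d in zip(rgb_feat, depth_feat):
        g = sigmoid(w_rgb * r + w_depth * d + bias)
        gates.append(g)
        # Convex combination of the two modalities at this position.
        fused.append(g * r + (1.0 - g) * d)
    return fused, gates
```

With all parameters at zero the gate is 0.5 everywhere and the output is the average of the two inputs. In a trained model the gate would be expected to lean toward depth in scenes where the RGB image is uninformative (for example at night) and toward RGB where it is reliable.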
Cross-Modality Attentive Feature Fusion for Object Detection in Multispectral Remote Sensing Imagery
Object detection is a canonical task in computer vision, as well as in remote sensing. Object detection in remote sensing imagery deals with detecting instances of visual objects of certain classes, most of which are man-made, such as buildings, airplanes, ships, and vehicles, to name a few. This technology has been widely used in many civilian and military fields, such as port and airport flow monitoring, traffic diversion, urban planning, and search and rescue for lost ships. Traditional machine learning (ML) schemes based on the encoding of handcrafted features (e.g., textures, color histograms, or more complex descriptors such as HOG Dalal and Triggs (2005), SIFT Lowe (2004), Haar Viola and Jones (2001), ACF Dollár, Appel, Belongie and Perona (2014), etc.) can only generate shallow to mid-level features with limited representational power. Recently, with the rapid development of deep learning (DL), convolutional neural networks (CNNs) have become a new and powerful approach for feature extraction and have greatly improved the performance of object detection. Current CNN-based object detection methods can be roughly divided into two streams: two-stage schemes and one-stage schemes. Two-stage detectors, such as R-CNN Girshick, Donahue, Darrell and Malik (2014), Fast R-CNN Girshick (2015), Faster R-CNN Ren, He, Girshick and Sun (2017) and other detectors Cai and Vasconcelos (2018); Pang, Chen, Shi, Feng, Ouyang and Lin (2019); Li, Chen, Wang and Zhang (2019b), divide detection into localization and recognition stages, and thus have one more region-proposal step than one-stage detectors.